ABSTRACT

Objectives: Observational work sampling is often used in occupational studies to assess categori- 
cal biomechanical exposures and occurrence of specific work tasks. The statistical performance of data 
obtained by work sampling is, however, not well understood, impeding informed measurement strategy 
design. The purpose of this study was to develop a procedure for assessing the statistical properties of 
work sampling strategies evaluating categorical exposure variables and to illustrate the usefulness of this 
procedure to examine bias and precision of exposure estimates from samples of different sizes. 

Methods: From a parent data set of observations on 10 construction workers performing a single
operation, the probabilities were determined for each worker of performing four component tasks
and working in four mutually exclusive trunk posture categories (neutral, mild flexion, severe flexion, 
twisted). Using these probabilities, 5000 simulated data sets were created via probability-based resam- 
pling for each of six sampling strategies, ranging from 300 to 4500 observations. For each strategy, mean 
exposure and exposure variability metrics were calculated at both the operation level and task level and 
for each metric, bias and precision were assessed across the 5000 simulations. 

Results: Estimates of exposure variability were substantially more uncertain at all sample sizes 
than estimates of mean exposures and task proportions. Estimates at small sample sizes were also 
biased. With only 600 samples, proportions of the different tasks and of working with a neutral trunk 
posture (the most common) were within 10% of the true target value in at least 80% of all the simulated 
data sets; rarer exposures required at least 1500 samples. For most task-level mean exposure variables 
and for all operation-level and task-level estimates of exposure variability, performance was low, even 
with 4500 samples. In general, the precision of mean exposure estimates did not depend on the expo- 
sure variability between workers. 

Conclusions: The suggested probability-based simulation approach proved to be versatile and 
generally suitable for assessing bias and precision of data collection strategies using work sampling to 
estimate categorical data. The approach can be used in both real and hypothetical scenarios, in ergonomics
as well as in other areas of occupational epidemiology and intervention research. The reported
statistical properties associated with sample size are likely widely relevant to studies using work sam- 
pling to assess categorical variables. 

KEYWORDS: epidemiology; ergonomics; exposure assessment methodology; precision; statistical 
efficiency; working postures 



INTRODUCTION 

Field assessments to quantify biomechanical exposures (physical loads) at work
frequently employ observational methods to determine body postures, type of materials
handling, and other ergonomic characteristics relevant for risk of musculoskeletal
disorders (Li and Buckle, 1999; Denis et al., 2000; Takala et al., 2010). Many observation
methods, such as the Ovako Working Posture Assessment System (OWAS) (Karhu et al., 1977),
the Task Recording and Analysis on Computer method (TRAC) (van der Beek et al., 1992),
the Back-Exposure Sampling Tool (Back-EST) (Village et al., 2009), and several more
(e.g. Hoogendoorn et al., 2000; Neumann et al., 2001; Bao et al., 2009), are based on
work sampling, with momentary observations collected at either fixed or random time
intervals. The resulting series of individual observations is then typically summarized
in terms of proportions of time in predetermined categories, such as posture intervals
or specific tasks. Time-based work sampling has been used for decades in industrial
engineering (Richardson and Pape, 1982) and ergonomics (Dempsey and Mathiassen, 2006)
as a tool to simultaneously assess the occurrence of tasks and work-related risk factors
for musculoskeletal disorders. While observational methods based on continuous,
event-based observation of biomechanical exposure are available (e.g. Punnett et al.,
1991; Christensen et al., 1995; Fransson-Hall et al., 1995; Fallentin et al., 2001;
Dartt et al., 2009; Hooftman et al., 2009; Mathiassen and Paquet, 2010), work sampling
was recently shown to be the more cost-efficient approach for observing working
postures (Rezagholi et al., 2012).

Other types of occupational exposures may also be described with categorical variables,
in particular the presence of workers in various chemical or acoustic environments
and/or tasks (e.g. Preller et al., 1995; Susi et al., 2000; Neitzel et al., 2011). In both
ergonomics and occupational hygiene, operations are often analyzed by task to identify
sources of exposure as targets for intervention (Dempsey and Mathiassen, 2006). At the
level of individual workers, the proportion of time in tasks can be used together with
information on task-specific exposures to estimate job exposures. Such task-based
exposure modeling has been used for both biomechanical (e.g. Burdorf et al., 1997;
Chen et al., 2004; Mathiassen et al., 2005; Svendsen et al., 2005; Bovenzi, 2009) and
other occupational exposures (e.g. Benke et al., 2000; Harrison et al., 2002; Semple
et al., 2003; Neitzel et al., 2011). Correct information on task proportions is a
prerequisite for these models to operate as intended, i.e. to produce an unbiased
estimate of the modeled exposure (Mathiassen et al., 2003a; Burstyn, 2009).

In general, guidance is scarce on how to design an appropriate data collection
strategy for estimating exposures for operations, tasks, or jobs of individual workers
using observational methods (Takala et al., 2010). This is a serious concern,
considering that awareness and proper appreciation of the statistical properties and
performance of the data collection strategy used with a particular observation method
is, arguably, at least as important to the interpretation of the resulting exposure
data as is the basic validity and reliability of that method (Takala et al., 2010).

For studies designed to assess mean exposures on a continuous scale, the ability of an
exposure sampling strategy to produce a correct exposure estimate, i.e. its statistical
performance, can be assessed using information on exposure variability in the target
population and the size of the exposure sample (Samuels et al., 1985; Mathiassen et al.,
2002, 2003a; Jackson et al., 2009; Liv et al., 2011). These algorithms are based on
assumptions about the distribution of the underlying exposures that are likely often
not met for exposures measured on categorical scales (Mathiassen and Paquet, 2010) and
for exposure variables other than the mean, including exposure variability metrics
(Liv et al., 2012). Simulation can be a viable alternative for assessing statistical
performance in such cases (Liv et al., 2011). Simulations can be based on expected
exposure distribution parameters (Semple et al., 2003) or on resampling of empirical
data sets, as in non-parametric bootstrapping (Burdorf and van Riel, 1996; Hoozemans
et al., 2001; Paquet et al., 2005; Mathiassen and Paquet, 2010; Liv et al., 2011, 2012).

The aim of the present study was to develop a general procedure for assessing the
statistical performance of observational work sampling of categorical exposure
variables and to use that procedure in a representative occupational scenario to gain
a better understanding of the influence of sample size on bias and precision of
estimates of variables expressing central tendencies and variability of task
occurrence and of working postures at the level of operation and tasks.

MATERIALS AND METHODS 

Parent data set 
Previously collected data using the PATH (Postures, Activities, Tools, and Handling)
observation method to assess tunnel and highway construction (Tak et al., 2011) were
utilized as the parent data set for this methodological study. PATH (Buchholz et al.,
1996) is a work sampling tool to estimate biomechanically relevant exposure variables.
PATH has primarily been used to provide exposure estimates at the operation level,
although exposures can also be estimated for separate tasks and even individual
workers. The PATH method is reproducible, given adequate training of observers (Park
et al., 2009), and valid compared with the results of direct technical measurements
(Paquet et al., 2001; Tak et al., 2007). In a recent review, PATH was rated as a
'thoroughly developed' method with a 'systematic and well-designed sampling approach'
(Takala et al., 2010). Thus, PATH serves as a suitable model for observational exposure
assessment employing a work sampling approach.

From the nine operations represented in the parent data set, we selected 'jacking pit
construction' by laborers as a model of an operation performed by several workers over
an extended period of time. Four component tasks occurred during the days observed: top
work, pit wall construction, manual excavation, and other miscellaneous work (Paquet
et al., 2005). The PATH observations of jacking pit construction were collected over
12 days spanning one calendar month. Observation periods ranged from 120 to 460 min
per day. The same two analysts observed this operation on each day; on one day, a third
observer was present.

After excluding one worker with fewer than 40 observations, the resulting data set
comprised a total of 3103 observations distributed among 10 workers (Table 1). For the
present paper, the primary biomechanical exposure of interest was trunk posture, which
was recorded as a categorical variable with four divisions: neutral (<20° flexion),
mild flexion (between 20° and 45°), severe flexion (>45°), and twist (with or without
flexion). Trunk posture was selected because it is an important risk factor for back
disorders (Punnett et al., 1991) and because non-neutral postures of the trunk were
frequently observed in all of the operations represented in the large construction
data set (Tak et al., 2011).

Simulated sampling strategies 
Six data collection strategies were simulated to reflect 
different durations of sampling, ranging from 5 to 
75 h, with observations 'collected' at 1-min intervals. 
The six strategies included 300, 600, 900, 1500, 3000, 
and 4500 observations, roughly corresponding to 1, 2, 
3 full days and 1, 2, 3 full weeks of sampling. For each 
of the six strategies, simulated observations were gen- 
erated using a probability-based procedure with the 
following stepwise algorithm: 

i. A worker was randomly selected from the 
group of all 10 workers, all workers having 
equal probabilities of being selected. 

ii. A task was randomly determined for that 
worker based on the probabilities in the par- 
ent data set of that worker performing each 
of the four possible tasks. 

iii. An exposure for that worker performing that 
task was randomly determined based on the 
probabilities in the parent data set of that 
worker experiencing each of the four possible 
exposure levels when performing that specific 
task. 

This simulation procedure reflects and reproduces 
the multinomial structure of categorical PATH obser- 
vations at the three hierarchical levels of subjects (10 
categories), tasks within subject (four categories), and 
exposures within task and subject (four categories). Any 
individual observation will result in a positive ('yes') 
answer in exactly one of the possible categories at each 
of these levels, and the probabilities of obtaining a 'yes' in 
any category within the level naturally add up to 100%. 
Thus, samples of (independent) multiple observations 
are categorically distributed, with properties determined 
by the true outcome probabilities in the set of categories 
within the same level (cf. Appendix). 
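
The three steps above can be sketched in a few lines of Python. This is only an illustration and not the authors' MatLab program: the names `simulate_dataset`, `WORKERS`, `TASKS`, and `POSTURES` are hypothetical, only two of the ten workers are included, and their probabilities are copied from Table 1.

```python
import random

TASKS = ["top work", "pit wall construction", "manual excavation", "miscellaneous work"]
POSTURES = ["neutral", "mild flexion", "severe flexion", "twisted"]

# Probabilities (in %) for workers 1 and 10, copied from Table 1.
# A task that a worker never performed (probability 0) needs no posture entry.
WORKERS = {
    1: {
        "task_p": [19.1, 30.6, 40.8, 9.5],
        "posture_p": {
            "top work": [89.3, 7.1, 1.8, 1.8],
            "pit wall construction": [54.4, 17.8, 22.2, 5.6],
            "manual excavation": [84.2, 12.5, 2.5, 0.8],
            "miscellaneous work": [71.4, 7.1, 21.4, 0.0],
        },
    },
    10: {
        "task_p": [0.0, 22.8, 69.6, 7.7],
        "posture_p": {
            "pit wall construction": [54.0, 24.8, 20.5, 0.6],
            "manual excavation": [70.3, 19.1, 9.2, 1.4],
            "miscellaneous work": [77.8, 22.2, 0.0, 0.0],
        },
    },
}


def simulate_dataset(workers, n_obs, rng):
    """Draw n_obs independent (worker, task, posture) observations."""
    worker_ids = list(workers)
    observations = []
    for _ in range(n_obs):
        # Step i: every worker is equally likely to be selected.
        worker = rng.choice(worker_ids)
        # Step ii: task drawn from that worker's task probabilities.
        task = rng.choices(TASKS, weights=workers[worker]["task_p"])[0]
        # Step iii: posture drawn from that worker's probabilities within the task.
        posture = rng.choices(POSTURES, weights=workers[worker]["posture_p"][task])[0]
        observations.append((worker, task, posture))
    return observations


rng = random.Random(1)  # fixed seed so the sketch is reproducible
data = simulate_dataset(WORKERS, 300, rng)
```

Calling `simulate_dataset` repeatedly (for example 5000 times per sample size) would give one simulated distribution per sampling strategy.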

Table 1. PATH parent data by individual worker, showing the total number of observations ($n_{\cdot\cdot s}$),
job-level trunk posture ($\mu_{E\cdot s}$, % total time)^a, proportional task occurrence ($w_{Ts}$, % total time)^b, and
task-level trunk posture ($\mu_{ETs}$, % task time)^c for each worker. A dash marks a task the worker was never observed performing.

Level                        Variable            Posture   Worker
                                                           1      2      3      4      5      6      7      8      9      10
Job level                    $n_{\cdot\cdot s}$            294    68     272    245    273    315    248    289    393    706
                             $\mu_{E\cdot s}$    Neutral   74.8   73.5   83.1   73.9   68.9   60.0   54.4   75.8   77.9   67.1
                                                 Mild      12.6   13.2   7.4    13.5   14.3   17.8   17.7   13.5   13.7   20.7
                                                 Severe    10.2   8.8    3.7    10.2   12.1   19.7   24.2   6.6    6.9    11.1
                                                 Twisted   2.4    4.4    5.9    2.5    4.8    2.5    3.6    4.2    1.5    1.1
Top work task                $w_{Ts}$                      19.1   8.8    17.7   26.9   75.5   0.0    0.0    32.5   50.4   0.0
                             $\mu_{ETs}$         Neutral   89.3   50.0   72.9   69.7   69.4   -      -      62.8   70.1   -
                                                 Mild      7.1    16.7   12.5   13.6   15.1   -      -      28.7   15.7   -
                                                 Severe    1.8    16.7   12.5   13.6   12.1   -      -      7.5    11.6   -
                                                 Twisted   1.8    16.7   2.1    3.0    3.4    -      -      1.1    2.0    -
Pit wall construction task   $w_{Ts}$                      30.6   51.5   12.9   16.7   3.7    33.0   21.4   2.8    0.0    22.8
                             $\mu_{ETs}$         Neutral   54.4   71.4   74.3   75.6   50.0   43.3   47.2   62.5   -      54.0
                                                 Mild      17.8   14.3   8.6    12.2   30.0   26.9   24.5   12.5   -      24.8
                                                 Severe    22.2   11.4   0.0    12.2   20.0   28.9   28.3   12.5   -      20.5
                                                 Twisted   5.6    2.9    17.1   0.0    0.0    1.0    0.0    12.5   -      0.6
Manual excavation task       $w_{Ts}$                      40.8   22.1   42.7   27.4   0.0    28.6   65.3   55.7   37.4   69.6
                             $\mu_{ETs}$         Neutral   84.2   66.7   84.5   67.2   -      37.8   50.6   80.1   84.4   70.3
                                                 Mild      12.5   20.0   7.8    22.4   -      24.4   17.9   6.8    12.9   19.1
                                                 Severe    2.5    6.7    3.5    6.0    -      32.2   27.2   6.8    2.0    9.2
                                                 Twisted   0.8    6.7    4.3    4.5    -      5.6    4.3    6.2    0.7    1.4
Miscellaneous work tasks     $w_{Ts}$                      9.5    17.7   26.8   29.0   20.9   38.4   13.3   9.0    12.2   7.7
                             $\mu_{ETs}$         Neutral   71.4   100.0  91.8   83.1   70.2   90.9   84.9   100.0  87.5   77.8
                                                 Mild      7.1    0.0    2.7    5.6    8.8    5.0    6.1    0.0    8.3    22.2
                                                 Severe    21.4   0.0    0.0    9.9    10.5   2.5    3.0    0.0    2.1    0.0
                                                 Twisted   0.0    0.0    5.5    1.4    10.5   1.7    6.1    0.0    2.1    0.0

a Job exposure: $\mu_{E\cdot s} = n_{E\cdot s} / n_{\cdot\cdot s}$ (% total time).
b Task proportion: $w_{Ts} = n_{\cdot Ts} / n_{\cdot\cdot s}$ (% total time).
c Task exposure: $\mu_{ETs} = n_{ETs} / n_{\cdot Ts}$ (% task time).

$n_{\cdot\cdot s}$, total number of samples from subject s, across all tasks and exposure categories; $n_{\cdot Ts}$, number of samples from subject s performing task T, irrespective
of exposure category; $n_{E\cdot s}$, number of samples from subject s within exposure category E, irrespective of task; and $n_{ETs}$, number of samples from subject s
within exposure category E, while performing task T.



Resampling at the level of individual observations as described above was repeated
until a complete set of simulated data had been created, as dictated by the number of
observations (from 300 to 4500) required for each of the six specific data collection
strategies. For each strategy, 5000 such data sets were generated. Simulations were
performed using a custom software program written in MatLab (MathWorks, Natick, MA,
USA; code provided as Supplementary material, available at Annals of Occupational
Hygiene online). Five thousand repeats have been considered a sufficient basis for
analyzing distributions in previous simulation studies (e.g. Semple et al., 2003; Liv
et al., 2012).

We selected this probabilistic resampling procedure as opposed to non-parametric
resampling with replacement from the parent data set (conventional bootstrapping;
Burdorf and van Riel, 1996; Hoozemans et al., 2001; Mathiassen and Paquet, 2010)
because it allows the probabilities assigned to the occurrence of workers, tasks, and
exposures to be manipulated. This, in turn, allows scenarios differing from the one
represented by the parent data set to be investigated, as illustrated by the assignment
of equal selection probabilities for all workers in this study (step i in the algorithm
above) even though they were, in fact, represented to different extents in the parent
data (cf. Table 1, $n_{\cdot\cdot s}$).

Task and exposure variables 
For each simulated data set, eight summary expo- 
sure variables were then calculated. As defined and 
explained in Table 2, we examined the following: 

• Two operation-level variables: average operation exposure in the group of workers
($\mu_{E\cdot\cdot}$), calculated using a mean-of-means approach (Samuels et al., 1985),
and variance between subjects in job exposure ($s^2_{BS-E\cdot s}$).

• Four task-level variables (calculated for each of the tasks in the operation): the
relative occurrence of the task in the operation ($w_{T\cdot}$); the variance between
subjects in task occurrence ($s^2_{BS-w_{Ts}}$); the group mean task exposure
($\mu_{ET\cdot}$); and the variance between subjects in task exposure ($s^2_{BS-ETs}$).

• Two variables summarizing differences between the four tasks in the operation: task
diversity ($MSA_E$) and task contrast within the operation ($C_E$).

Since a complete set of variables was obtained for 
each simulated data set, 5000 sets of all eight expo- 
sure variables (some of them quadrupled, since the 
operation contained four tasks) were available for 
each of the six investigated sampling strategies. 
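
As a small illustration of the two operation-level variables, the sketch below applies a mean-of-means and a between-worker variance to the job-level neutral-posture percentages in Table 1. The function names are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption, since Table 2 is not reproduced here; it gives a value close to the target reported in Table 3a.

```python
from statistics import mean, variance

# Job-level % of time in neutral trunk posture for workers 1-10 (Table 1).
neutral_by_worker = [74.8, 73.5, 83.1, 73.9, 68.9, 60.0, 54.4, 75.8, 77.9, 67.1]


def group_mean_exposure(worker_values):
    """Mean-of-means: each worker counts equally, whatever their number of samples."""
    return mean(worker_values)


def between_subject_variance(worker_values):
    """Sample variance (n - 1 denominator, an assumption) of worker-level exposures, in %^2."""
    return variance(worker_values)


mu = group_mean_exposure(neutral_by_worker)        # about 70.9 %
s2 = between_subject_variance(neutral_by_worker)   # about 73.6 %^2
```

The small difference from the target of 73.4 in Table 3a is presumably due to rounding of the worker-level percentages in Table 1.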

Sampling performance 
The mean and 5th-95th percentile range of the 
cumulative probability plots of the 5000 simulated 
values for each task and posture variable, category, 
and sampling strategy were calculated as summary 



measures of statistical performance, reflecting the 
precision of the resulting exposure estimate. As an 
additional measure of performance, the proportion of 
the 5000 values falling between 90 and 110% of the 
value in the parent set, i.e. a ±10% level of 'coverage 
probability' (Landon and Singpurwalla, 2008), was 
determined for all variables, categories, and sampling 
strategies. This metric reflects the ability of the sam- 
pling strategy to produce a result in close proximity 
to the 'true' target value, capturing the combined 
effects of bias and imprecision. The distributional 
properties of the cumulative probability plots were 
also examined visually. A custom software program 
was written in MatLab to calculate the eight task and 
posture variables and to assess statistical performance 
from the cumulative distributions, using the metrics 
described above. 
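
The two performance measures described above can be sketched as follows. This is a minimal illustration, not the authors' MatLab code: the function names are hypothetical, and the percentile interpolation method (the default of `statistics.quantiles`) is an assumption.

```python
from statistics import quantiles


def coverage_probability(estimates, target, tolerance=0.10):
    """Percentage of simulated estimates within +/- tolerance of the target value."""
    low, high = (1 - tolerance) * target, (1 + tolerance) * target
    inside = sum(low <= x <= high for x in estimates)
    return 100.0 * inside / len(estimates)


def percentile_range(estimates, lower=5, upper=95):
    """Return the (lower, upper) percentiles of the simulated estimates."""
    cuts = quantiles(estimates, n=100)  # 99 cut points; cuts[p - 1] is the p-th percentile
    return cuts[lower - 1], cuts[upper - 1]


# Hypothetical estimates around a target of 10 (illustration only).
example = [8.0, 9.5, 10.0, 10.5, 12.0]
cov = coverage_probability(example, target=10.0)  # three of five values lie within 9-11
```

In the study, `estimates` would hold the 5000 simulated values of one variable for one category and one sampling strategy.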

For the variables $\mu_{E\cdot\cdot}$, $w_{T\cdot}$, and $\mu_{ET\cdot}$, the coverage
probabilities reflect results in each individual category, independent of other
categories in the same set (e.g. in each posture category at the operation level).
However, they do not capture correlations (negative or positive) between results within
a category set, which will be present due to the 'compositional' nature of the data
(Aitchison, 1986; Reimann et al., 2012), i.e. that results inherently add up to 100%.
Thus, in order to measure the ability of our sampling strategies to deliver results
close to the truth in several categories (tasks or postures) simultaneously, we
calculated 'compositional coverage probabilities', i.e. the proportions of the 5000
data sets under each strategy that returned a result between 90 and 110% of the parent
data set value in 0, 1, 2, 3, or all four categories.
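
A compositional coverage probability of this kind can be sketched as below. The function name and the two example data sets are hypothetical; the targets are the operation-level mean exposures reported in Table 3a.

```python
from collections import Counter


def compositional_coverage(datasets, targets, tolerance=0.10):
    """Percentage of data sets with exactly k categories within +/- tolerance of target.

    datasets: list of dicts mapping category -> estimate (one dict per simulated data set)
    targets:  dict mapping category -> target value from the parent data set
    """
    counts = Counter()
    for estimates in datasets:
        hits = sum(
            abs(estimates[c] - targets[c]) <= tolerance * abs(targets[c])
            for c in targets
        )
        counts[hits] += 1
    return {k: 100.0 * counts[k] / len(datasets) for k in range(len(targets) + 1)}


targets = {"neutral": 70.9, "mild": 14.4, "severe": 11.3, "twisted": 3.3}
# Two hypothetical simulated data sets (illustration only, not study results).
datasets = [
    {"neutral": 70.0, "mild": 14.0, "severe": 11.0, "twisted": 3.3},  # all four within 10%
    {"neutral": 75.0, "mild": 18.0, "severe": 5.0, "twisted": 2.0},   # only neutral within 10%
]
result = compositional_coverage(datasets, targets)
```

Summing the entries for k and above gives the 'at least k categories' proportions.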

RESULTS 

Parent data set 

Exposures in the parent data set are summarized in the column 'Target' in Table 3a
(operation level), Table 4a (task level, miscellaneous work), and Supplementary
Table S1a, available at Annals of Occupational Hygiene online (task level, other
tasks). The 10 jacking pit construction workers spent, on average, 70.9% of their job
time in neutral postures, 14.4% in mild flexion, 11.3% in severe flexion, and 3.3% in
twisted trunk postures ($\mu_{E\cdot\cdot}$; Table 3a). The four component tasks, i.e.
top work, pit wall construction, manual excavation, and miscellaneous work, comprised,
on average, 23.1, 19.5, 38.9, and 18.4% of the operation, respectively ($w_{T\cdot}$;
Table 4a and Supplementary Table S1a, available at Annals of Occupational Hygiene
online). The mean time across workers spent within a certain posture category in each
task ($\mu_{ET\cdot}$; Table 4a and Supplementary Table S1a, available at Annals of
Occupational Hygiene online) ranged from 59.2 to 85.8% (neutral), 6.6 to 19.1% (mild
flexion), 4.9 to 17.3% (severe flexion),


Table 3. (a) Operation-level and job-level exposure variables for all simulation strategies and
the parent data set (column 'Target'). All cells but 'Target' show the mean value with 5th-95th
percentile values from the simulated distributions located in square brackets below. For
explanation of exposure variables see Table 2. (b) Coverage probabilities for operation and job-
level exposure variables: percentage of simulations (n = 5000 for each strategy) in which the
estimated exposure value was between 90 and 110% of the true target exposure value in the parent
data set. Cells with 90% coverage or more highlighted in dark gray; cells with 80-89.9% coverage
highlighted in light gray

(a)
Variable             Posture   Simulated sampling strategy, number of samples                                              Target
                               300            600            900            1500           3000           4500
$\mu_{E\cdot\cdot}$  Neutral   70.9           71.0           71.0           70.9           70.9           70.9           70.9
                               [66.6-75.3]    [67.9-74.1]    [68.4-73.4]    [69.0-72.8]    [69.5-72.3]    [69.8-72.0]
                     Mild      14.5           14.4           14.4           14.4           14.4           14.4           14.4
                               [11.2-17.8]    [12.1-16.8]    [12.5-16.4]    [13.0-15.9]    [13.4-15.5]    [13.6-15.3]
                     Severe    11.3           11.3           11.3           11.4           11.4           11.3           11.3
                               [8.4-14.4]     [9.3-13.5]     [9.6-13.0]     [10.0-12.7]    [10.4-12.3]    [10.6-12.1]
                     Twisted   3.3            3.3            3.3            3.3            3.3            3.3            3.3
                               [1.7-5.1]      [2.1-4.5]      [2.3-4.3]      [2.5-4.1]      [2.8-3.8]      [2.9-3.8]
$s^2_{BS-E\cdot s}$  Neutral   140.6          106.4          94.6           86.1           79.7           77.7           73.4
                               [59.9-243.8]   [52.7-173.7]   [51.6-146.5]   [52.3-124.6]   [55.5-107.7]   [58.0-99.1]
                     Mild      55.0           33.8           26.9           21.3           17.2           15.8           13.2
                               [20.3-102.8]   [14.1-60.8]    [11.7-47.1]    [10.0-35.3]    [9.6-26.3]     [9.6-22.9]
                     Severe    70.5           54.3           48.9           44.6           41.5           40.5           38.4
                               [27.0-131.3]   [24.9-93.9]    [24.5-79.7]    [25.7-67.3]    [27.7-57.0]    [29.1-53.1]
                     Twisted   13.0           7.7            5.8            4.5            3.4            3.0            2.3
                               [3.8-27.5]     [2.7-15.5]     [2.2-11.3]     [1.8-8.2]      [1.6-5.7]      [1.6-4.9]
$MSA_E$              Neutral   124.7          109.5          103.7          98.1           94.9           93.4           90.9
                               [26.9-255.2]   [34.4-208.6]   [40-188.2]     [50.0-160.2]   [60.1-134.9]   [65.7-125.1]
                     Mild      42.9           33.5           30.1           26.7           24.0           23.2           21.7
                               [6.4-104.6]    [6.3-78.7]     [7.4-66.0]     [9.4-51.8]     [12.6-39.0]    [13.7-34.6]
                     Severe    37.4           29.3           26.1           23.1           21.4           20.7           19.3
                               [5.1-97.8]     [4.7-76.5]     [5.8-64.9]     [7.0-49.9]     [9.6-37.6]     [11.3-32.5]
                     Twisted   8.4            4.9            3.5            2.3            1.4            1.0            0.5
                               [0.7-30.0]     [0.4-17.7]     [0.3-11.7]     [0.2-6.6]      [0.2-3.7]      [0.1-2.7]
$C_E$                Neutral   0.21           0.24           0.27           0.30           0.34           0.35           0.35
                               [0.05-0.40]    [0.09-0.42]    [0.12-0.43]    [0.17-0.44]    [0.23-0.45]    [0.26-0.44]
                     Mild      0.15           0.18           0.20           0.24           0.28           0.30           0.32
                               [0.02-0.32]    [0.04-0.34]    [0.06-0.37]    [0.09-0.40]    [0.15-0.42]    [0.18-0.43]
                     Severe    0.14           0.15           0.16           0.18           0.21           0.22           0.22
                               [0.02-0.30]    [0.03-0.30]    [0.05-0.31]    [0.07-0.32]    [0.10-0.32]    [0.12-0.32]
                     Twisted   0.09           0.07           0.07           0.06           0.04           0.04           0.02
                               [0.02-0.19]    [0.01-0.17]    [0.01-0.15]    [0.01-0.12]    [0.01-0.09]    [0.01-0.08]

(b)
Variable             Posture   Simulated sampling strategy, number of samples
                               300     600     900     1500    3000    4500
$\mu_{E\cdot\cdot}$  Neutral   99.4    100.0   100.0   100.0   100.0   100.0
                     Mild      51.2    68.3    77.8    89.0    97.4    99.2
                     Severe    46.8    63.1    72.3    83.1    94.9    98.1
                     Twisted   24.6    33.6    41.5    52.2    68.5    78.6
$s^2_{BS-E\cdot s}$  Neutral   6.9     12.5    18.4    25.0    35.3    42.4
                     Mild      0.6     3.0     5.4     9.2     18.0    22.7
                     Severe    8.3     15.4    18.0    23.5    33.2    40.3
                     Twisted   0.5     1.9     4.1     7.3     14.3    18.5
$MSA_E$              Neutral   9.8     14.5    17.2    21.7    31.4    38.7
                     Mild      7.4     10.4    11.3    15.4    22.4    27.3
                     Severe    8.0     10.4    12.3    13.9    20.6    24.4
                     Twisted   0.8     1.6     2.3     3.2     5.5     6.3
$C_E$                Neutral   9.4     14.4    18.9    26.2    40.4    47.2
                     Mild      5.8     8.4     11.0    16.3    25.1    28.6
                     Severe    10.5    13.5    14.6    18.2    24.6    27.2
                     Twisted   1.7     2.9     3.7     4.5     6.1     6.9


and 2.7 to 4.4% (twisted). Tasks differed most with respect to neutral postures
($MSA_E$; Table 3a). It should be noted that the values of $MSA_E$ are somewhat
inflated due to exposure variability within each of the four tasks. Task contrasts
within the operation corroborated that the tasks differed most consistently in the
occurrence of neutral and mild trunk postures ($C_E$; Table 3a).

Individual workers differed considerably in how often they were observed in each of
the four posture categories, particularly for neutral posture and severe trunk flexion
($s^2_{BS-E\cdot s}$; Table 3a), as well as in the relative proportion of time they
spent performing each of the four tasks ($s^2_{BS-w_{Ts}}$; Table 3a). Even within a
task, postures differed considerably between subjects ($s^2_{BS-ETs}$; Table 4a and
Supplementary Table S1a, available at Annals of Occupational Hygiene online), with the
caveat that these variabilities include contributions from within-subject variability
(between and within measurement days), which could not be isolated and adjusted for.

Statistical performance: operation level 

For all sampling strategies, operation exposure, $\mu_{E\cdot\cdot}$, was estimated
without bias relative to the 'true' target (Table 3a). This is illustrated by the
alignment of the inflection points of the simulated data dispersion curves and the
line indicating the target value in Fig. 1a. As expected, the 5th-95th percentile
prediction interval narrowed with increasing sample size ($\mu_{E\cdot\cdot}$;
Table 3a, values in square brackets). This narrowing is illustrated in Fig. 1a by the
decreased dispersion of the 5000 simulated data sets at larger sample sizes.

Coverage probability depended on the true occur- 
rence of the postural exposure in the parent data set 
(Table 3b). For example, for neutral trunk posture, 
which was the most frequently observed, a high level 
of coverage probability (99.4%) was shown even for a
sampling strategy with only 300 observations. In con- 
trast, for the most rarely observed posture, twisting, 
even a sampling strategy containing 4500 observa- 
tions led only to a 78.6% probability of producing an 
operation exposure estimate within ±10% of the target 
value. Compositional coverage probabilities for pos- 
tures at the operation level increased with sample size 
(Fig. 2a). With 300 observations, about one of every 
five data sets contained values deviating more than 
10% from the target value in three or all four posture 
categories. Even with 4500 observations, fewer than 
80% of all data sets showed values within ±10% of 
the target in all four categories. These compositional 
coverage probabilities deviated by ±5 percentage points or less from the probabilities
predicted under the assumption that the coverage in each category was independent of
that in the other categories in the same set. As an example, the predicted
compositional coverage probability under the 300 observation strategy of getting a
value close to the truth in all four posture categories at the operation level was
(cf. Table 3b) 0.994 × 0.512 × 0.468 × 0.246 × 100%, i.e. 5.9%, whereas the
actual compositional coverage was 5.7% (Fig. 2a).
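
The independence-based prediction is simply a product of the single-category coverage probabilities, as the short sketch below shows for the 300 and 4500 observation strategies. The function name is hypothetical; the input values are the operation-level mean exposure rows of Table 3b.

```python
from math import prod

# Coverage probabilities (%) for operation-level mean exposure, from Table 3b.
coverage_300 = {"neutral": 99.4, "mild": 51.2, "severe": 46.8, "twisted": 24.6}
coverage_4500 = {"neutral": 100.0, "mild": 99.2, "severe": 98.1, "twisted": 78.6}


def predicted_all_categories(coverages_percent):
    """Predicted % of data sets within tolerance in all categories, assuming independence."""
    return 100.0 * prod(c / 100.0 for c in coverages_percent.values())


print(round(predicted_all_categories(coverage_300), 1))   # 5.9
print(round(predicted_all_categories(coverage_4500), 1))  # 76.5
```

The prediction for 4500 observations (about 76.5%) agrees with the statement that fewer than 80% of those data sets were within ±10% of the target in all four categories.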

In contrast to the unbiased mean exposure, estimates of between-subject variability
were upwardly biased for all sampling strategies; shorter duration samples were more
severely biased and also showed wider prediction intervals ($s^2_{BS-E\cdot s}$;
Table 3a). This is illustrated by the horizontal curve shifts for smaller sampling
strategies in Fig. 1b. The effect of sample size on bias may result primarily from the
fact that the variance of the individual exposure mean values included in the estimate
of $s^2_{BS-E\cdot s}$ is larger for smaller sample sizes and thus inflates the value
of $s^2_{BS-E\cdot s}$ to a larger extent. Coverage probability was much lower for
estimates of between-subject variability than for mean exposures. In the best case,
i.e. for neutral trunk postures, coverage probability for $s^2_{BS-E\cdot s}$ never
exceeded 42.4%, even with 4500 samples (Table 3b). In the worst case, that of twisted
postures assessed by the 300 observations strategy, only 0.5% of the 5000 simulated
data sets fell within ±10% of the target $s^2_{BS-E\cdot s}$ value, due to the
combined effect of bias and imprecision.
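
The suggested mechanism for the upward bias can be checked with a rough calculation. The sketch below is an approximation and not part of the authors' analysis: it assumes that each worker receives exactly one tenth of the observations and that each worker's estimated proportion carries independent binomial sampling variance p(100 - p)/n (in %²), which is added to the between-worker variance of the Table 1 values. The function name is hypothetical.

```python
from statistics import mean, variance

# Job-level % of time in neutral trunk posture for workers 1-10 (Table 1).
neutral_by_worker = [74.8, 73.5, 83.1, 73.9, 68.9, 60.0, 54.4, 75.8, 77.9, 67.1]


def expected_variance_estimate(worker_pct, total_obs):
    """Approximate expected between-worker variance estimate (%^2) for a given sample size.

    Assumes equal, fixed numbers of observations per worker (a simplification).
    """
    n_per_worker = total_obs / len(worker_pct)
    true_between = variance(worker_pct)
    # Average binomial sampling variance of one worker's estimated percentage.
    sampling = mean(p * (100.0 - p) for p in worker_pct) / n_per_worker
    return true_between + sampling


approx_300 = expected_variance_estimate(neutral_by_worker, 300)    # about 140
approx_4500 = expected_variance_estimate(neutral_by_worker, 4500)  # about 78
```

These approximate values are close to the simulated means for neutral posture in Table 3a (140.6 with 300 samples and 77.7 with 4500 samples), which is consistent with sampling variance at the worker level being the main source of the bias.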

A similar pattern of larger bias and wider prediction intervals at smaller sample sizes
was also observed for $MSA_E$ (Table 3a, Fig. 1c). Again, increased bias was likely due
to the effect of within-subject variability on task exposure estimates. Notably, this
means that with small sample sizes, task exposures appear to be more different from
each other than they really are. Even at the largest sample size (here 4500
observations), $MSA_E$ did not converge completely to the target value.

By definition (cf. Table 2), the effect of sample size on contrast, $C_E$, will be a
trade-off between the effects on $MSA_E$ and on within-task variability. In the present
material, both decreased with increasing sample size (for $s^2_{BS-ETs}$, see below).
In most cases, the net effect was that $C_E$ was underestimated at small sample sizes
but increased toward the target value with larger sample sizes (Table 3a, Fig. 1d). For
twisted postures, however, the opposite was seen: $C_E$ was overestimated at small
sample sizes and decreased toward the target with larger sample sizes. In general, the
effect of an increasing sample size on the prediction interval was not as pronounced
for $C_E$ as for the other exposure variables, so coverage probability did not improve
as markedly with larger sample sizes. For example, coverage probability for $C_E$ in
severe flexion only increased from 10.5 to 27.2% as the number of samples increased
from 300 to 4500 (Table 3b).

Fig. 1. Simulated cumulative distributions of variables describing exposure to severe trunk flexion at the operation level:
(a) group mean exposure, $\mu_{E\cdot\cdot}$ (%); (b) variance between workers, $s^2_{BS-E\cdot s}$ (%²); (c) task diversity (mean squared deviation
between task exposures), $MSA_E$ (%²); (d) exposure contrast between tasks, $C_E$. Each panel shows the distribution of the
5000 simulated results obtained by each of the six investigated observation strategies, from left to right as indicated by the
legend in the upper right corner (colored online). Dashed vertical lines indicate the target value read from the parent data
set (Table 3a, column 'Target').


Statistical performance: task level 
The mean proportion of time spent performing each
individual task (w_T.; Table 4a and Supplementary
Table S1a, available at Annals of Occupational Hygiene
online) was estimated without bias, even when
using the smallest sampling strategy. The prediction
interval narrowed with increasing sample size, as
expected. Compositional coverage probability was
better for the set of task proportions (Fig. 2b) than
for postures (Fig. 2a), as might be expected from the
larger coverage probabilities for each individual task
proportion (Table 4b, Supplementary Table S1b,
available at Annals of Occupational Hygiene online).
Already with only 300 observations, fewer than 1 in
every 10 data sets contained task proportions deviating
more than 10% from the true target value for three or
all four tasks (Fig. 2b), and essentially all data sets
containing 4500 observations resulted in task
proportions within 10% of the target value for all
four tasks.
Figure 2. Effect of sample size (x-axis: number of observations) on compositional coverage probabilities for
(a) operation-level mean exposure; (b) task proportions; and (c) task-level mean exposure in miscellaneous work.
Curves show proportions of the 5000 simulated data sets at each sample size that included values within 10% of the
true target in at least 1, at least 2, at least 3, or all 4 categories as indicated by the legend in the upper right
corner of each panel (colored online). Posture categories: neutral, mild, severe, twisted; tasks: top work, pit wall
construction, manual excavation, miscellaneous work. Diagrams corresponding to panel (c) for the other three tasks are
shown in Supplementary Figure S1 (available at Annals of Occupational Hygiene online).



In general, even the mean exposure value for
individual tasks (μ_ET.; Table 4a, Supplementary Table
S1a, available at Annals of Occupational Hygiene
online) was unbiased for all six sampling strategies
and also demonstrated a decreasing prediction
interval with increasing sample size. Coverage
probability for task mean exposure estimates, μ_ET.
(Table 4b, Supplementary Table S1b, available at
Annals of Occupational Hygiene online), was, in
general, considerably lower than for the corresponding
operation-level mean exposures, μ_E.. (Table 3b),
mainly because task exposure estimates were based
on fewer samples and thus were less precise. This
also led to compositional coverage being considerably
lower for task mean exposures (Fig. 2c,
Supplementary Figure S1, available at Annals of
Occupational Hygiene online) than for operation
mean exposure (Fig. 2a). For instance, with 4500
samples, exposures were within ±10% of the target
value in all four posture categories for 76.8% of
the assessments of operational exposure (Fig. 2a),
whereas only 7.5% of the compositional exposure
estimates in miscellaneous work were within ±10%
(Fig. 2c). For between-subject variability in task
occurrence (s²_BS-wTS) and task exposure (s²_BS-ETS),
pronounced bias and wide prediction intervals
were present at smaller sample sizes, in particular
for s²_BS-ETS in the three 'rarer' tasks. Thus, the
coverage probability for s²_BS-wTS and s²_BS-ETS was
low in these three cases (Table 4b, Supplementary
Table S1b, available at Annals of Occupational Hygiene
online).

Figure 3. Simulated cumulative distributions of variables describing exposure to severe trunk flexion in the task miscellaneous work:
(a) group mean exposure (%), μ_ET.; (b) variance between workers (%²), s²_BS-ETS. Each panel shows the distribution of the 5000
simulated results obtained by each of the six investigated observation strategies, from left to right as indicated by the legend in
the upper right corner (colored online). Dashed vertical lines indicate the target value read from the parent data set (Table 4a,
column 'Target').

Distributional properties of exposure estimates 

For some exposure estimates in rare tasks under the smaller
sampling strategies, the shape of the cumulative
distribution across the 5000 simulated values differed from
that found for the same exposure variable with larger
strategies. For example, in the miscellaneous work
task for the 300-sample strategy, the median estimated
exposure to severe flexion was smaller than the median
obtained with larger sample sizes (Fig. 3a), and severe
flexion did not even occur in approximately 8% of
the 5000 data sets, as shown by the positive
y-intercept value in Fig. 3a. Under the 300-sample strategy,
an average of 55 samples (18.4%; w_T., Table 4a) are
expected to come from miscellaneous work. Since
only 4.9% (μ_ET., Table 4a) of these 55 samples are
expected to show severe trunk flexion, a data set with
300 samples will, occasionally, contain no
observations of severe flexion from any of the workers
performing the task. In this case, task exposure to severe
flexion at the group level will be zero and so contribute
to the observed positive y-intercept.
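
The reasoning above can be sketched numerically. The following minimal example assumes that every observation is an independent draw using the pooled group-level proportions quoted from Table 4a (18.4% and 4.9%); the function name is hypothetical. Because pooling ignores differences between workers, the result is only a rough approximation of the simulated frequency of about 8%.

```python
def prob_no_event(n_obs, p_task, p_exposure_given_task):
    """Probability that none of n_obs independent observations shows the
    exposure within the task (pooled-probability assumption)."""
    p_joint = p_task * p_exposure_given_task
    return (1.0 - p_joint) ** n_obs


n_obs = 300
p_task = 0.184    # proportion of miscellaneous work, w_T. (Table 4a)
p_severe = 0.049  # mean severe flexion within that task (Table 4a)

expected_in_task = n_obs * p_task              # observations in the task
expected_severe = expected_in_task * p_severe  # of which severe flexion
p_zero = prob_no_event(n_obs, p_task, p_severe)

print(f"expected observations in task: {expected_in_task:.1f}")
print(f"expected severe-flexion observations: {expected_severe:.1f}")
print(f"P(no severe flexion observed): {p_zero:.3f}")
```

With only about three expected observations of severe flexion, a data set without any such observation is not a rare event.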

With the 300-sample strategy, the cumulative
distribution of severe trunk flexion in miscellaneous
work also appeared slightly jagged (Fig. 3a).
This discrete graphical pattern is even more apparent
in Fig. 3b, which shows the task-specific,
between-worker exposure variability, s²_BS-ETS,
for severe trunk flexion during miscellaneous work.
Under the 300-sample strategy, five to six simulated
observations on average from each individual worker
(i.e., 55/10) will be in miscellaneous work. Thus, each
observation (sample) will account for about 20% of the
time in the task and influence the task exposure of that
worker accordingly. The jagged distribution in Fig. 3b
indicates that the presence or absence of single
samples for each particular worker changes the size of the
exposure variability in a stepwise fashion. This pattern
is also visible at sample sizes larger than 300, albeit to
a lesser extent.

DISCUSSION 

Statistical performance of work sampling for 
categorical variables 
The primary aim of this study was to develop a pro- 
cedure for investigating the statistical properties of 
selected task and exposure variables, estimated from 
data obtained by different observational work sam- 
pling strategies. The proposed probabilistic simulation 
approach proved useful for disclosing sampling per- 
formance when assessing variables that are difficult to 
access using analytical methods, and resembles in this 
respect previous simulation studies of precision and 
bias associated with exposure estimation (e.g. Semple
et al., 2003; Liv et al., 2011). Although the empirical
illustration based on a large parent data set of PATH
observations served primarily to demonstrate the
ability of the probabilistic simulation to produce results
of generic relevance to the design of data collection
strategies, we also believe the exposure structure of 
the parent data set to be representative of many cat- 
egorical data sets of occupational postures obtained 
using work sampling because all workers performed 
all tasks and experienced all exposures at least
occasionally. Thus, essential results, such as the statistical
properties of variability metrics compared with those
of mean values, or of task-level variables compared
with operation-level variables, may be generally
applicable to most occupational exposure assessment
efforts. Therefore, we discuss these results in more
detail below, even if numerical values may be specific
to the present parent data.

The precision of mean exposure estimates at both
the operation and task levels (μ_E.. and μ_ET.; Tables
3a and 4a) improved with increasing sample sizes
from 300 to 4500 observations. This result is consistent
with several other studies addressing the relationship
between sample size and uncertainty in estimating mean
exposures (Burdorf and van Riel, 1996; Hoozemans
et al., 2001; Mathiassen et al., 2002; Fethke et al., 2007),
as well as a previous analysis of PATH performance in
different operations (Paquet et al., 2005). Thus, our
study corroborates that highly prevalent exposures
may be determined reasonably correctly even with
relatively small sample sizes, in casu, 300 observations
(Table 3a), but that less common exposures, in
particular if measured at the task level, may be estimated with
considerable uncertainty even with very large total
sample sizes, in casu, 4500 observations (Tables 3a and 4a,
and Supplementary Table S1a, available at Annals of
Occupational Hygiene online). PATH is unlikely to differ
much in this respect from other observational methods
for assessing categorical variables by work sampling,
such as Back-EST (Village et al., 2009), TRAC (van der
Beek et al., 1992; Frings-Dresen and Kuijer, 1995; van
der Beek et al., 1995), or OWAS (Karhu et al., 1977;
Kivi and Mattila, 1991).

The finding that increased sample size improved 
statistical performance was, of course, expected. The 
variance of an estimated mean value of a continuous 
variable will decrease in inverse proportion to the 
number of samples, provided that the data are ran- 
domly distributed (Samuels et al, 1985). In the pre- 
sent case, each basic unit of measurement, i.e. each 
single PATH observation, is an array of multinomial 
sets of 'yes' or 'no' answers to whether any particular
worker is observed at that very instant, whether he
is performing each specific task, and whether the 
exposure is within each specific posture category. 
Thus, data in any particular category are binomially dis- 
tributed, with the probability of the answer 'yes' being 
defined by the overall proportion of samples in that 
category in the parent data set. With large sample sizes, 
mean values of binomial variables approach normal 
distributions and so behave statistically as a continu- 
ous variable, with properties that can be approximated 
using analytical methods (cf. Appendix). In the present 
case, binomial theory led to an expectation that mean 
values of task proportions and postures would be unbi- 
ased and normally distributed with a variance directly 
computable from the workers' individual exposures 
and the total number of samples (Appendix). 

This theoretical prediction, however, is conditional
on collected data sets being large and balanced.
In the present case, theoretical expectations,
such as the size of prediction intervals (cf. Tables 3a
and 4a, and Supplementary Table S1a, available at
Annals of Occupational Hygiene online), were, indeed,
met for most sampling strategies at the operation
level and for both task proportions and posture
variables. However, at the task level, several deviations
from expected performance were observed. For
instance, using the 300-sample strategy, the 5th-95th
prediction interval for severe flexion in
miscellaneous work (μ_ET.; Table 4a) was wider (12.0%)
than predicted by theory (9.1%) and also somewhat
skewed (0.0-12.0%) around the mean value of 5.0%.
In this case, only about three observations of severe
flexion in miscellaneous work will be available in a
complete 300-sample data set (4.9% of 18.4% of 300;
cf. Table 4a), and they are not likely to always be
equally distributed among different workers. Thus,
the data available for estimating exposure to severe
flexion in this task are neither 'many' nor balanced.
Other examples of irregular distributions were
shown in Fig. 2. The occasional discrepancy between
theoretical and empirical (simulated) performance
illustrates an important use of the probabilistic
simulation approach, i.e. to show when assumptions are
no longer met in analytical models. By extension,
simulations are also very attractive when addressing
distributions of variables that cannot be readily
addressed by theoretical models, such as exposure
variability metrics (Liv et al., 2012).
Both theoretical equations (Appendix) and
simulation results draw attention to the fact that
the uncertainty of mean exposure estimates did
not relate directly to the size of between-subject
variability in the exposure. As an illustration, the
variability between workers in task proportions
(s²_BS-wTS, Table 4a) was much larger for most tasks than
the variability in job exposures (s²_BS-E.S, Table 3a), yet
the 5th-95th prediction intervals for task proportions
and operation exposures were of the same order of
magnitude. This counter-intuitive property of the
present work sampling is a result of the sampling being
performed in a finite population, which, in large
data sets, eliminates the contribution of
between-worker variability to the variance of group mean
values (cf. Appendix), and of the fact that the average
within-subject variability will be less if workers differ
substantially in mean exposure than if they are more
homogeneous.

Consistent with previous simulations examin- 
ing continuous exposure variables (Liv et al, 2012), 
uncertainty (i.e. the 5th-95th prediction intervals) 
was considerably larger for variables describing vari- 
ability than for mean values at the same sample size. 
Prediction intervals for variance estimates were also 
upwardly skewed with respect to the mean,
particularly at small sample sizes (Tables 3a and 4a). As a
combined effect of bias and low precision, coverage 
probabilities for variability metrics were, in general, 
poor; in some cases with small sample sizes, they were 
<5%, even at the operation level (Table 3b). 

The effect of increased sample size on metrics
expressing aspects of exposure variability (s²_BS-E.S,
MSΔ_E, s²_BS-wTS, s²_BS-ETS, C_E) is more difficult to
predict than the effect on mean values (μ_E.., μ_ET.,
w_T.), and considerably less literature has been
devoted to the statistical performance of such
variability metrics (Mathiassen et al., 2002, 2003b; Liv et al.,
2012). In the present material, most of these variables
(including s²_BS-wTS and s²_BS-ETS) were upwardly
biased at all sample sizes and particularly at smaller
sample sizes (Tables 3a and 4a, and Supplementary
Table S1a, available at Annals of Occupational Hygiene
online). We believe this to result mainly from the larger
uncertainty at smaller sample sizes of both
within-subject and within-task variability. Within-subject
variability was present to different extents for all 10
observed workers, as suggested by their individual
exposure profiles (Table 1), but again within-days and
between-days contributions could not be separated.

As noted above, exhaustive sets of mutually 
exclusive categories lead to compositional results, 
i.e. constrained data that inevitably add up to a cer- 
tain number, in casu, 100%. This constrained nature 
of categorical data was reflected and reproduced by 
the resampling procedure employed for simulating 
new data sets. Comprehensive use of compositional 
data, for instance in hypothesis testing or regression 
analysis, requires specific procedures differing from 
conventional Euclidean algebra (e.g. Fry et al., 2000;
Filzmoser et al., 2009; Filzmoser et al., 2010; Reimann 
et al., 2012), which fall outside the scope of this study. 
However, the reported compositional coverage prob- 
abilities, measuring the ability of a sample estimate to 
show values 'close' to the truth in an entire category set, 
revealed that probabilities within a category set, such 
as the four posture categories in operation exposure, 
were, indeed, correlated. However, discrepancies were 
small between empirical compositional coverage
probabilities (Fig. 2, Supplementary Figure S1, available at
Annals of Occupational Hygiene online) and values pre- 
dicted with the assumption of independence among 
categories. This suggests that compositional coverage 
can be fairly well estimated on the basis of binomial 
theory (cf. Appendix). 
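
As an illustration of the independence assumption referred to above, the following minimal sketch computes the probability that at least k of the categories in a set fall within tolerance, given a coverage probability for each category. The four coverage values are hypothetical and are not taken from Tables 3b or 4b; the function name is likewise an assumption.

```python
from itertools import product


def coverage_at_least(coverages):
    """Return a list whose element k is P(at least k categories are within
    tolerance), assuming the categories behave independently."""
    n = len(coverages)
    exactly = [0.0] * (n + 1)  # exactly[k] = P(exactly k categories covered)
    for outcome in product([0, 1], repeat=n):
        prob = 1.0
        for hit, c in zip(outcome, coverages):
            prob *= c if hit else (1.0 - c)
        exactly[sum(outcome)] += prob
    # Summing from k upward gives the "at least k" probabilities
    return [sum(exactly[k:]) for k in range(n + 1)]


# Hypothetical coverage probabilities for four posture categories
cov = [0.95, 0.80, 0.70, 0.40]
at_least = coverage_at_least(cov)
for k in range(1, len(cov) + 1):
    print(f"at least {k} of {len(cov)} within tolerance: {at_least[k]:.3f}")
```

Under independence, the probability that all categories are within tolerance is simply the product of the individual coverage probabilities, so a single poorly estimated category limits the compositional coverage of the whole set.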

How many observations are sufficient? 
Although we have demonstrated that larger observa- 
tion samples lead to better statistical performance, 
if in different ways and at different rates for different 
posture variables, we have deliberately avoided using 
the term 'sufficient' for any particular level of perfor- 
mance. A number of previous studies have, indeed, 
identified certain sample sizes as 'sufficient', 'enough',
'adequate', or leading to 'reliable' results (Burdorf and
van Riel, 1996; Allread et al., 2000; Hoozemans et al.,
2001; Paquet et al., 2005; Fethke et al., 2007; Trask
et al, 2008). The criterion has, however, often been 
largely arbitrary and based on the premise that preci- 
sion will not improve to any notable extent beyond 
this 'sufficient' sample size. Other studies have, on 
more formal grounds, identified the necessary sample 
size to obtain a specific precision of a mean exposure 
estimate (Mathiassen et al, 2003b), the necessary 
study size to obtain 'acceptable' power in studies com- 
paring independent groups (Mathiassen et al, 2002), 
or the necessary number of subjects or samples when
testing interventions using individuals as their own 
control (Mathiassen et al, 2003b; Mathiassen and 
Paquet, 2010). 

Design requirements for an exposure data collec- 
tion will also differ profoundly depending on whether 
the study is, e.g. devoted to documenting exposures in 
a specific occupational setting (as the present study), 
comparing mean exposures between groups or condi- 
tions (Mathiassen et al, 2002; Mathiassen and Paquet, 
2010), comparing exposures to threshold limit values 
(Lyles and Kupper, 1996), determining exposure-out- 
come relationships using either an individual-based or 
a group-based approach (Burdorf, 1995; Seixas and 
Sheppard, 1996; Tielemans et al, 1998; Nordander 
et al, 2004) or estimating sources and sizes of expo- 
sure variability (Eliasziw and Donner, 1987; Liv 
et al, 2012). For each of these study types, the neces- 
sary sample size will further differ depending on the 
choice of summary statistics and the distribution of 
the selected exposures, as demonstrated in the present 
study (Tables 3b and 4b) and numerous other studies 
showing that variability within and between subjects 
differs among exposure variables (e.g. for working
postures: van der Beek et al., 1995; Burdorf and van Riel,
1996; Mathiassen et al., 2003b; Hansson et al., 2006;
Bao et al., 2009; Dartt et al., 2009; Wahlström et al.,
2010).

Thus, the required statistical performance in any 
exposure data collection strategy, and hence the nec- 
essary sample size, is specific to the purpose, context, 
variables, and desired sensitivity of that particular 
study for which the sampling is carried out. In the pre- 
sent study, which focused on a descriptive documenta- 
tion of exposures in a specific construction operation, 
300 PATH observations would be sufficient for obtain- 
ing an 80% probability that the resulting estimate of 
the occurrence of neutral trunk postures is within 10% 
from the correct value (Table 3b). The same cover- 
age probability for operation exposures to mild and 
severe trunk flexion would require 1200 and 1500 
samples, respectively, while an assessment of twisted 
postures would not reach 80% coverage probability 
even with 4500 samples. All variables describing
exposure variability between subjects and tasks (Table 3b)
showed coverage probabilities below 80% even with
4500 samples, so even larger samples, probably in
excess of what is practicable in many occupational
studies, would be needed to reach a satisfactory
performance. For all four tasks, proportions could be
determined with sufficient coverage probability by
900 samples (Table 4b), whereas only a few task
exposures and no task exposure variabilities reached
sufficient coverage probability even with 4500 samples.
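
For mean values, the link between the prevalence of a category and the number of samples needed for a given coverage probability can be sketched with a normal approximation. The example below is a simplification under stated assumptions: all observations are treated as independent draws with one pooled probability p, ignoring the worker and task structure of the data, and the prevalence values are hypothetical. It is therefore not expected to reproduce the sample sizes read from Tables 3b and 4b.

```python
from math import ceil, sqrt
from statistics import NormalDist


def coverage_probability(p, n, rel_tol=0.10):
    """Approximate P(|estimate - p| <= rel_tol * p) for a proportion p
    estimated from n independent observations."""
    se = sqrt(p * (1.0 - p) / n)
    return 2.0 * NormalDist().cdf(rel_tol * p / se) - 1.0


def min_samples(p, target=0.80, rel_tol=0.10):
    """Smallest n for which the approximate coverage reaches the target."""
    z = NormalDist().inv_cdf(0.5 + target / 2.0)
    return ceil(z * z * (1.0 - p) / (rel_tol * rel_tol * p))


# Hypothetical prevalences: a common, a less common, and a rare category
for p in (0.60, 0.20, 0.05):
    n = min_samples(p)
    print(f"p = {p:.2f}: n >= {n}, coverage = {coverage_probability(p, n):.3f}")
```

Because the tolerance is defined relative to the target value, the required number of samples grows roughly in proportion to (1 - p)/p, which is why rare categories are so demanding.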

Although tentatively providing these guidelines for 
sampling in occupational settings and for metrics sim- 
ilar to those represented here, we wish to emphasize 
that prediction intervals will, for mathematical rea- 
sons, reach a width of zero (i.e. perfect precision) only 
at an infinite number of samples. Thus, 'saturation' or 
convergence to the 'true' value will never occur in a 
pure statistical sense. From a practical point of view, 
however, the return, in terms of improved precision 
with an increased number of samples, may, at some 
point, decrease below what is considered reasonable 
from a resource consumption perspective. A certain 
level of imprecision may even be deemed acceptable, 
and additional sampling beyond what is needed to 
achieve this statistical performance is then of limited 
value. 

Understanding categorical work sampling and 
probabilistic simulation 

A number of studies have addressed aspects of sam- 
pling performance for exposure variables measured 
on a continuous scale, using analytical expressions 
based on variance components (e.g. Mathiassen et al, 
2003a, 2003b; Kazmierczak et al, 2006; Lampa et al, 
2006; Chen et al, 2009; Jackson et al, 2009). A large 
majority of these studies have been devoted to under- 
standing the precision of mean exposure estimates. To 
our knowledge, very few attempts have been made to 
analyze statistical sampling performance for variables 
describing exposure variability, let alone task diver- 
sity or task contrasts. Standard analytical methods 
are applicable only for normally distributed and inde- 
pendent data, including data that can, via a suitable 
transformation, reach normality, which is standard 
practice for most exposure assessments in the field of 
occupational hygiene (Loomis and Kromhout, 2004). 
Although some violations of assumptions in standard 
analysis methods can be handled by modified statisti- 
cal models, any analytical approach is limited to expo- 
sure variables for which sampling performance can be 
expressed in a closed-form equation, typically expres- 
sions of central tendencies (mean values). 
We chose to investigate sampling performance for
variables expressing exposure variability on the basis
of virtual data sets obtained by a simulation procedure,
rather than developing an analytical approximation.
Our approach of using the probabilities of task and
exposure occurrence for each worker as observed in
the original PATH observation data set is an example
of parametric simulation, where data units are assumed
to follow a known distribution, and virtual data sets
are created by randomly selecting values from that
distribution with a predetermined setup of parameters
(Semple et al., 2003). In the case of multinomial data,
this parametric simulation may be particularly
appealing since the distribution is fully characterized by the
set of probabilities of 'positive' ('yes') outcomes in
the categories within a set. As an alternative, we could
have used non-parametric bootstrap resampling with
replacement among the 3103 observations in the
parent data set (Efron and Tibshirani, 1986; Burdorf
and van Riel, 1996; Mathiassen and Paquet, 2010;
Liv et al., 2011). We chose not to do so because of the
highly unbalanced structure of the parent data set: the
10 workers were represented to highly different
extents, ranging from 68 to 706 data points (Table 1).
To mimic a scenario where all workers worked full time
and were equally available for observation, we
assigned all workers an equal probability of being
selected on any single occasion.
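
The resampling idea can be illustrated by a minimal sketch. It assumes that each simulated observation is created by selecting a worker with equal probability, then drawing a task from that worker's task probabilities, and finally drawing a posture from that worker's posture probabilities within the task. The two workers, two tasks, all probabilities, and all function names below are hypothetical and serve only to show the mechanics.

```python
import random

# Hypothetical per-worker probabilities (illustrative, not from the parent data)
WORKERS = {
    "worker_A": {
        "task": {"excavation": 0.7, "miscellaneous": 0.3},
        "posture": {
            "excavation": {"neutral": 0.50, "mild": 0.30, "severe": 0.15, "twisted": 0.05},
            "miscellaneous": {"neutral": 0.80, "mild": 0.10, "severe": 0.05, "twisted": 0.05},
        },
    },
    "worker_B": {
        "task": {"excavation": 0.4, "miscellaneous": 0.6},
        "posture": {
            "excavation": {"neutral": 0.60, "mild": 0.25, "severe": 0.10, "twisted": 0.05},
            "miscellaneous": {"neutral": 0.85, "mild": 0.10, "severe": 0.02, "twisted": 0.03},
        },
    },
}


def draw(rng, probs):
    """Draw one category from a {category: probability} mapping."""
    return rng.choices(list(probs), weights=list(probs.values()))[0]


def simulate_dataset(workers, n_obs, rng):
    """Create one virtual data set of (worker, task, posture) observations.
    Every worker has the same probability of being selected."""
    names = list(workers)
    data = []
    for _ in range(n_obs):
        worker = rng.choice(names)
        task = draw(rng, workers[worker]["task"])
        posture = draw(rng, workers[worker]["posture"][task])
        data.append((worker, task, posture))
    return data


def group_mean_percent(data, posture):
    """Mean-of-means: percent time in `posture` per worker, then averaged."""
    hits, totals = {}, {}
    for worker, _task, observed in data:
        totals[worker] = totals.get(worker, 0) + 1
        hits[worker] = hits.get(worker, 0) + (observed == posture)
    return sum(100.0 * hits[w] / totals[w] for w in totals) / len(totals)


rng = random.Random(2014)  # fixed seed so the sketch is reproducible
estimates = sorted(
    group_mean_percent(simulate_dataset(WORKERS, 300, rng), "severe")
    for _ in range(500)
)
lo, hi = estimates[int(0.05 * 500)], estimates[int(0.95 * 500) - 1]
print(f"mean estimate: {sum(estimates) / len(estimates):.2f}%")
print(f"5th-95th percentile interval: {lo:.2f}% to {hi:.2f}%")
```

Repeating the simulation for each sample size of interest gives the distribution of an exposure metric across virtual data sets, from which bias and prediction intervals can be read directly.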

This decision also illustrates an attractive property 
of probabilistic simulation compared with straightfor- 
ward bootstrap resampling with replacement, namely, 
that the probabilities assigned to the occurrence of 
different subjects and exposures can be manipulated 
so as to explore hypothetical scenarios that differ 
from the one represented by the parent data set. For 
instance, in a reorganization of an operation, work- 
ers may be assigned new proportions of constituent 
tasks, even if the overall occurrence of each task in the 
job does not change. Virtual redistributions of tasks 
among workers can easily be simulated by changing 
individual values of task proportions while
maintaining the original values of overall task occurrence
(w_T.) and the overall mean job and task exposures
(μ_E.. and μ_ET.). Additional scenarios accessible to
probabilistic simulation are the introduction of new tasks
and changes in individual exposures, e.g. following
from an ergonomic intervention. Even alternative
scenarios referring to the logistics of exposure
assessment may be considered by probabilistic simulation.
For instance, some workers may not be accessible
for observation during the whole period of data col- 
lection because they occasionally work in a location 
where observations are not feasible. This situation 
can be simulated by manipulating the probabilities 
of individual workers being selected for observation. 
Also the number of accessible workers sharing the 
tasks in the operation can be changed. We encourage 
more studies of data collection strategies using proba- 
bilistic simulation, especially for categorical data with 
binomial or multinomial distributions. 
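
As a minimal sketch of such a scenario, the example below converts assumed availabilities into selection probabilities and draws the workers for one 300-observation data set. The worker labels, the availability of 0.5 for one worker, and the function name are hypothetical.

```python
import random
from collections import Counter


def selection_weights(availability):
    """Convert each worker's fraction of time available for observation
    into selection probabilities that sum to 1."""
    total = sum(availability.values())
    return {worker: a / total for worker, a in availability.items()}


# Ten workers, one of whom is assumed to be accessible only half the time
availability = {f"worker_{i}": 1.0 for i in range(1, 11)}
availability["worker_3"] = 0.5

probs = selection_weights(availability)
rng = random.Random(7)
picked = rng.choices(list(probs), weights=list(probs.values()), k=300)
counts = Counter(picked)

for worker in sorted(probs):
    print(f"{worker}: p = {probs[worker]:.3f}, observed {counts[worker]} times")
```

The resulting data sets are unbalanced by design, so this kind of scenario also shows how precision changes when the assumption of equal access to all workers is relaxed.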

When the measurement method itself contributes
uncertainty, which is inevitably true in observational
studies (Denis et al., 2000; Takala et al., 2010),
the variance of the eventual exposure estimate will
include a methodological contribution. In our parent
data set, variability within and between observers
could not be distinguished from other sources of
variability, and thus, it was not possible to determine
the specific effects of between-observer and
within-observer variability on the overall performance of
the investigated sampling strategies. Other studies
have, however, shown that between-observer reli- 
ability is good when PATH observations are made 
by trained observers (Park et al, 2009), as in the 
present parent data set. With other observational 
methods and/or for other variables than those 
observed in PATH, observer variability has been 
shown to contribute significantly to the uncertainty 
of the eventual exposure estimate (Kazmierczak 
et al, 2006; Dartt et al, 2009; Rezagholi et al, 2012; 
Mathiassen et al, 2013). We therefore recommend 
that the effects of variability between and within 
observers be specifically addressed in future studies 
of strategies for observing categorical variables. 

Finally, a further step in optimizing study designs
would be to consider the cost of different data
collection strategies that lead to a satisfactory
statistical performance (Mathiassen and Bolin,
2011; Rezagholi et al., 2012; Mathiassen et al., 2013).
Methods for cost-efficiency analyses are, in general,
still in their infancy (Rezagholi and Mathiassen,
2010) and have so far been based only on analytical
estimations of sampling performance. Analysis of
cost-efficient exposure assessment using simulations,
including specific investigations of categorical posture
data obtained by work sampling, is a challenging issue
for further research.
CONCLUSIONS

The present study proposed a novel probabilistic simu- 
lation approach for categorical data and used it to reveal 
the statistical performance of observations of tasks 
and trunk postures obtained using work sampling.
Performance improved with increasing numbers of 
samples from 300 up to 4500. At each particular sample 
size, mean exposures were, in general, estimated with 
considerably better precision than variables describing 
aspects of exposure variability between workers and 
tasks; estimates of exposure variability were also biased 
at small sample sizes. Even with 4500 samples,
variables describing exposure variability were not estimated
with satisfactory coverage probability, either at the level
of individual categories or for compositional sets of
categories. The simulation approach thus proved
useful for examining the performance of alternative
exposure assessment strategies, and we also claim that it can
be used to explore hypothetical scenarios of exposure 
profiles, task occurrences, and access to workers when 
collecting data. 

We believe that these results have a bearing in 
general on occupational exposure assessment studies 
where data are recorded in a categorical form. When 
planning such a study, an analysis — to the extent pos- 
sible — of the performance of different design options 
is key to arriving at a data collection that can effectively 
provide information of the desired quality, as defined 
by the purpose of the study. The probabilistic simula- 
tion approach proposed in this paper is useful in this 
proactive process. 
APPENDIX

Precision of Mean Exposure Estimates Based on 
Categorical Data 

The present calculation of group mean values of
task proportions, w_T., and postures at the operation
and task levels, μ_E.. and μ_ET., respectively,
is based on a mean-of-means approach (Samuels
et al., 1985) in which exposure estimates are first
obtained for each individual worker, and then
averaged across workers to give the group mean. In a
balanced data set containing an equal number of
measurements for each worker, the variance
of a group mean obtained by a mean-of-means
approach in a finite (limited) group is

$$ \operatorname{Var}(\text{group mean}) = \left(1 - \frac{n_s}{N}\right) \frac{s_{BS}^{2}}{n_s} + \frac{s_{WS}^{2}}{N^{*}} \qquad (1) $$

where n_s, N, and N* are the number of workers
represented in the data set, the total number of workers
in the finite target population, and the total number
of samples in the data set, respectively, and s²_BS and
s²_WS are the between-worker and within-worker
variance components. If every worker in the target
population delivers data to the mean, n_s equals N, and
the contribution from between-worker variability
vanishes in a 'finite population' effect as discussed in
previous papers (Mathiassen et al., 2003a; Liv et al.,
2011; Mathiassen et al., 2013). With the present
simulation procedure, where one worker was randomly
selected for each data point from the 10 workers
eligible for observation as per the 'standard' PATH
protocol (Buchholz et al., 1996), this 'saturation' is
more likely to occur if the total number of
measurements is 'large'. With N = 10, the probability of
any one worker not being represented in the data set
is only 1.9·10^-11% with a full set of 300 observations,
whereas it is approximately 5% with 50 observations,
as might occur at the task level under the 300-sample
strategy. Thus, all 10 workers were likely represented
in all simulated data sets at both the operation and task
level, except for some data sets at the task level when
using the smallest sampling strategies, in particular if
some task or posture category occurred very rarely for
some worker(s).
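
The two probabilities quoted above can be checked with a short calculation. The sketch assumes that each observation selects one of the N workers independently and with equal probability, and uses inclusion-exclusion to obtain the probability that at least one worker is never selected; the function name is hypothetical.

```python
from math import comb


def prob_any_worker_missing(n_workers, n_obs):
    """P(at least one of n_workers is never selected in n_obs draws),
    with equal selection probability, by inclusion-exclusion."""
    return sum(
        (-1) ** (k + 1) * comb(n_workers, k) * (1 - k / n_workers) ** n_obs
        for k in range(1, n_workers + 1)
    )


for n_obs in (50, 300):
    p = prob_any_worker_missing(10, n_obs)
    print(f"{n_obs} observations: {100 * p:.2g}% chance of a missing worker")
```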

Thus, in 'large' data sets, the group mean variance 
expressed by equation (1) is determined entirely by 
the uncertainty associated with estimating the mean 
exposures of individual workers within the group. 
Each basic unit of measurement in the present data, 
i.e., each single PATH observation, is a multinomial 
set of binary answers 'yes' or 'no' to whether posture 
and task at that very instant are within a particular cat- 
egory for the selected worker. If the probability of the 
answer 'yes' in a particular category is p, the expected 
variance between individual samples will, according to 
multinomial theory, be p(100 - p), with p measured in
percent, and the variance of a mean exposure estimate
based on n samples is, therefore, p(100 - p)/n. For
a certain sample size n, this variance is largest when 
the probability, p, of a 'yes' is 50% and will decrease 
symmetrically toward zero at probabilities above and 
below 50%. 
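
As a minimal illustration of this relationship, the Python sketch below evaluates p(100 - p)/n and compares it with the variance of simulated proportion estimates. The function names, the number of simulations, and the random seed are arbitrary choices made for the example, not part of the study's procedure.

```python
import random


def proportion_variance(p_percent: float, n: int) -> float:
    """Theoretical variance (percent squared) of a proportion from n yes/no samples."""
    return p_percent * (100.0 - p_percent) / n


def simulated_variance(p_percent: float, n: int, n_sims: int = 2000, seed: int = 1) -> float:
    """Empirical variance of the estimated proportion across simulated data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        # Count 'yes' answers among n independent observations.
        yes = sum(rng.random() < p_percent / 100.0 for _ in range(n))
        estimates.append(100.0 * yes / n)
    mean = sum(estimates) / n_sims
    return sum((e - mean) ** 2 for e in estimates) / (n_sims - 1)


for p in (10, 50, 90):
    print(p, proportion_variance(p, 100), round(simulated_variance(p, 100), 2))
```

The theoretical values for p = 10% and p = 90% coincide, which reflects the symmetry around 50% noted above.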

Thus, the variance of a mean exposure estimate in 
a particular category for one worker is expected to 
be p(100 - p)/n, where p is the exposure probability 
(in percent) and n is the number of samples for that 
worker. When individual workers' mean exposures 
are averaged to form a group mean, the contribution 
from within-worker variability to the resulting vari- 
ance on that mean is therefore 



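$$\frac{1}{N \cdot N^*}\sum_{s=1}^{N}\left[\mu_s\left(100 - \mu_s\right)\right] \tag{2}$$

for operation exposures and similarly

$$\frac{100}{N \cdot w_T \cdot N^*}\sum_{s=1}^{N}\left[\mu_{s(T)}\left(100 - \mu_{s(T)}\right)\right] \tag{3}$$
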
for exposures in task T, where N* is the total number of
samples at the operation level, cf. equation (1); notation
otherwise as in Tables 1 and 2. Both equations
assume that the same number of samples is obtained 
from all workers, at the operation level, equation (2), as 
well as the task level, equation (3). For an unbalanced 
data set, i.e. one with unequal numbers of observations 
for different workers, the variance is larger, especially if 
the data are severely unbalanced (Samuels et al., 1985). 
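
The balanced-data case can be illustrated with a short Python sketch. It assumes that each of the N workers contributes N*/N samples, so that averaging the per-worker variances p(100 - p)/n across workers gives a within-worker contribution of the sum of p(100 - p) over workers, divided by N·N*. At the task level, N* is replaced by the expected number of samples falling in the task. The function names and the worker probabilities are hypothetical.

```python
def within_worker_variance_operation(p_by_worker, n_total):
    """Within-worker contribution to the group mean variance, operation level.

    p_by_worker: exposure probability (percent) for each worker
    n_total: total number of samples, split equally among workers
    """
    n_workers = len(p_by_worker)
    return sum(p * (100.0 - p) for p in p_by_worker) / (n_workers * n_total)


def within_worker_variance_task(p_by_worker_in_task, task_percent, n_total):
    """Same contribution for exposure within one task.

    task_percent: proportion of time spent in the task (percent), assumed
    equal for all workers so that the task-level data remain balanced.
    """
    n_workers = len(p_by_worker_in_task)
    n_task = n_total * task_percent / 100.0  # expected samples within the task
    return sum(p * (100.0 - p) for p in p_by_worker_in_task) / (n_workers * n_task)


# Hypothetical exposure probabilities (percent) for 10 workers.
p_workers = [40, 45, 50, 55, 60, 35, 65, 50, 48, 52]
print(within_worker_variance_operation(p_workers, n_total=300))
print(within_worker_variance_task(p_workers, task_percent=25, n_total=300))
```

For the same worker probabilities, a task occupying 25% of the time gives a variance four times larger than at the operation level, because only a quarter of the samples inform the task-level estimate.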

The estimates of occurrence for each of the four 
tasks in the jacking pit construction operation 
($w_T$; Table 4a and Supplementary Table S1a, available
at Annals of Occupational Hygiene online) are
equivalent, from a statistical point of view, to the 
estimates of operation and task postures since all are 
based on observations having an answer of 'yes' or 
'no' to whether that particular unit represents each 
particular task or posture category. Thus, if all workers 
are represented in the data set, the variance of the esti- 
mated group mean proportion of any one of the four 
tasks can — for a balanced data set — be expressed as 



$$\frac{1}{N \cdot N^*}\sum_{s=1}^{N}\left[w_{Ts}\left(100 - w_{Ts}\right)\right] \tag{4}$$



If a 'large' number of binomially distributed samples 
are averaged, the mean value will be normally distrib- 
uted. Thus, prediction intervals for postures and task 
proportions can be calculated by using the variances 
in equations (2-4) in a standard assessment of lower 
and upper limits associated with a coverage according 
to choice, in casu the 5th and 95th percentile. 
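
A prediction interval of this kind might be computed as in the Python sketch below, which relies on the normal approximation and on the balanced-data variance of a group mean proportion. The task proportions and the total of 600 samples are hypothetical inputs chosen for illustration, and the function names are not taken from the study.

```python
from statistics import NormalDist


def balanced_group_variance(percent_by_worker, n_total):
    """Variance of a group mean proportion when all workers are sampled equally."""
    n_workers = len(percent_by_worker)
    return sum(w * (100.0 - w) for w in percent_by_worker) / (n_workers * n_total)


def prediction_interval(mean_percent, variance, lower=0.05, upper=0.95):
    """Lower and upper percentile of a normally distributed estimate."""
    dist = NormalDist(mu=mean_percent, sigma=variance ** 0.5)
    return dist.inv_cdf(lower), dist.inv_cdf(upper)


# Hypothetical task proportions (percent) for 10 workers.
task_percent = [20, 25, 30, 22, 28, 26, 24, 27, 23, 25]
mean_w = sum(task_percent) / len(task_percent)
var_w = balanced_group_variance(task_percent, n_total=600)
low, high = prediction_interval(mean_w, var_w)
print(f"mean {mean_w:.1f}%, 5th to 95th percentile: {low:.2f}% to {high:.2f}%")
```

The approximation is only reasonable when the number of samples is large and the proportion is not close to 0% or 100%; for rare categories the interval may extend below zero and should then be interpreted with caution.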